ABSTRACT


This  paper  describes  a  categorization  of  data  structure 
characteristics  which  are  intrinsic  to  information  system 
environments.  An  approach  to  transferring  data  bases 
among  information  systems  based  on  these  data  structure 
levels  is  presented. 
The need for the transfer of data bases among information systems grows
constantly as more and more data is collected and manipulated by computers.

The introduction of new, more advanced information management systems requires
that data bases created and manipulated on old systems be reshaped and
transplanted into equivalent data bases for new systems. (By "information
management systems," we mean any computing environment supporting data base
retrieval and generation.) This process is usually inefficient, slow, and
expensive, especially for large data bases. The need for data base transfer also
arises in situations where the data that exist in one application system could be
useful for a second application system, provided that they could be restructured
to be operated on by the second system.


The Common Information Structures project is attempting to develop a methodology
and techniques for restructuring and transferring data bases across disparate
information systems. Information system data structures, in their physical
organizations and embodiments, reflect the uniqueness of the system applications.
Each system employs data structure organizations, data access methods, and data
management functions specifically tailored to a particular application. As a
result, one information system cannot readily access data contained in another
information system, particularly when the two systems function within different
operating environments. Our goal is to make wider access possible by describing
data bases in terms of data structure levels and transferring data bases from
one system to another through these levels.


For the past year, in pursuit of this goal, we have studied numerous documents
on data structure organization, information algebra, and functional
properties of data management systems (see references). This provided the basis
for characterizing information structures at three levels: logical, storage,
and physical. Data structures at the logical level reflect information about
elements in the data base, relationships among them, and the ordering on them.
Data structures at the storage level reflect access paths and index organizations.
These structures are used by system designers mostly to provide time/space
efficiency (such as partially or fully inverted organization of data). Data
structures at the physical level reflect file and record organization and are
intimately connected to operating systems and their access methods.


This characterization of data structures forms the basis for an approach to
data base transfer in which data are transformed from their physical
representation to their storage representation to their logical representation
in the source system. By means of common information structure techniques, this
logical representation is transformed into the equivalent target logical
representation in the target system. Finally, the target logical data are
transformed through the storage level to the physical-level representation in
the target system. The techniques involving common information structures
require a logical description of the source data base and a mapping of that
description into an equivalent logical description of the target data base. This
data base description mapping serves as the basis for the specification of a
data base transformer that takes instances of source data and transforms them
into instances of target data.

2.  DATA  STRUCTURE  LEVELS 

We feel most data bases can be viewed as collections of descriptions about
discrete entities which we call "individuals." These descriptions are
instantiations of properties with property values that characterize individuals.
We call these characterizing properties "data elements," the property values
"data values," and the instantiated descriptions "data items."
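
As an illustration only, the terminology above might be sketched in Python as
follows. The class names, field names, and the sample employee are assumptions
of this sketch and are not part of the paper's formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """An instantiated description: one data element bound to one data value."""
    element: str   # the characterizing property ("data element")
    value: object  # the property value ("data value")


@dataclass
class Individual:
    """A discrete entity described by a collection of data items."""
    name: str
    items: list

    def value_of(self, element):
        """Return the data value recorded for a data element, or None."""
        for item in self.items:
            if item.element == element:
                return item.value
        return None


# Hypothetical sample data, not taken from the paper.
employee = Individual("E-17", [DataItem("surname", "Lee"), DataItem("grade", 7)])
```

Here a data base would simply be a collection of such individuals; nothing in
the sketch says how the items are indexed or laid out on a device.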

We  believe  that  the  users  of  an  information  system  should  be  given  the  capability 
to  describe  data  bases  at  a  logical  level,  independently  of  both  the  system's 
internal  design  and  the  characteristics  of  the  computer  hardware.  Data  structure 
characteristics  that  reflect  logical  entities  and  relationships  among  these 
entities,  as  viewed  by  a  system  user,  constitute  this  logical  data  structure 
level.  Such  characteristics  include: 

(1)  specifications  of  individuals  and  data  elements 

(2)  specifications of type and range of data values

(3)  specifications  of  data  element  relationships  with  respect  to 
individuals 

(4)  specifications  of  data  items 

(5)  grouping  and  ordering  of  data  items 

(6)  specifications  of  maps  and  relationships  between  groups  of  data 
items 

The storage data structure level comprises data characteristics which effect
data inversions and indexing organization of the data. System designers
manipulate storage data structure characteristics to achieve time/space efficiency.
These  characteristics  may  be  peculiar  to  a  specific  information  system  and 
include: 

(1)  facility  to  create  or  modify  data  access  paths 

(2)  specification  of  index  organization 

(3)  specifications  of  inversions  of  data 

(4)  alternative  ordering  specifications 

(5)  realization  of  data  items,  groups,  and  orderings  with  specific 
path  indicators  (pointers) 

(6)  realization  of  relationships  between  groups  of  data  item  characters 
such  as  length  and  character  set  representation 

(7)  sequencing  considerations  like  ISAM,  HSAM,  etc. 


The  physical  data  structure  level  comprises  data  characteristics  that  are 
intimately  connected  to  operating  systems.  These  characteristics  include: 

(1)  realization  of  pointers  by  addressing  techniques 

(2)  character  set  representations  (octal,  hex,  etc.)  and  computer 
word  specification  like  "byte,"  "bit,"  etc. 

(3)  blocking  specifications  for  I/O  purposes 

(4)  buffering  considerations 

(5)  channel  I/O 

(6)  specification of inter-record gaps

(7)  device-dependent  access  specifications  such  as  cylinder  track 
considerations  on  disk,  blocking  factors  on  tape,  or  paging 
considerations  on  drum 

Figure  1  shows  what  might  be  typical  representations  of  data  in  the  various 
data  structure  levels.  At  the  logical  level,  the  manner  in  which  individuals 
are  characterized  by  data  items  is  shown.  The  storage  level  shows  additional 
access  and  ordering  paths  which  are  realized  through  pointers.  At  the  physical 
level,  addressing  techniques  are  used  to  realize  the  data  organization.  In 
this  representation,  the  data  items  are  linked  by  addresses  and  deciphered  by 
length  specifications. 
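
The figure itself is not reproduced here. A rough sketch of the same idea,
under assumptions of our own choosing (a two-item individual, list indices
standing in for pointers, and a one-byte length prefix standing in for the
length specification), might look like this:

```python
# Logical level: an individual characterized by data items.
logical = {"surname": "Lee", "city": "Reno"}

# Storage level: the same items as nodes chained by pointers (list indices),
# plus an inverted index that gives a second access path by value.
storage = []
for element, value in logical.items():
    storage.append({"element": element, "value": value, "next": None})
for i in range(len(storage) - 1):
    storage[i]["next"] = i + 1
inverted = {node["value"]: i for i, node in enumerate(storage)}


# Physical level: values packed into one byte string. Each field is a
# one-byte length followed by that many ASCII bytes.
def pack(values):
    out = bytearray()
    for v in values:
        data = v.encode("ascii")
        out.append(len(data))  # length specification (assumes < 256 bytes)
        out += data
    return bytes(out)


def unpack(buf):
    values, addr = [], 0
    while addr < len(buf):
        length = buf[addr]
        values.append(buf[addr + 1: addr + 1 + length].decode("ascii"))
        addr += 1 + length  # address of the next field
    return values


physical = pack(node["value"] for node in storage)
```

The point of the sketch is only that each level adds detail the level above
does not need: pointers and indexes at the storage level, addresses and
lengths at the physical level.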

Data  structure  characteristics  that  are  internal  to  specific  computing  machines 
constitute  the  machine  data  structure  level.  These  characteristics  include 
device  considerations  and  hardware  constraints  that  prescribe  interfaces  between 
the operating system and the machine. The MITRE Report [12] on data management
systems illustrates that machine-level data characteristics of different
machines are so diverse that building data transformers for data representations
at this level is both unreasonable and unnecessary. In our approach, we
plan to take advantage of existing information system query and generate
capabilities to read source and write target data bases. Consequently, the machine
data  structure  level,  per  se,  is  not  involved  in  the  data  transformation  process. 

The  following  list  summarizes  the  typical  characteristics  of  the  various  levels 
of  data  structures. 

•  Logical level
   •  User's concept of the data base
   •  Individuals
   •  Data elements
   •  Data values
   •  Data items
   •  Data element relationships and links

•  Storage level
   •  Data access and index organization
   •  Keys, inversions, orderings
   •  Time/space considerations

•  Physical level
   •  File organization
   •  Record specification
   •  Operating system I/O

•  Machine level
   •  Device considerations
   •  Hardware constraints
   •  Intimate operating system/computer interface

Figure 1.  Data Transformation Between Data Structure Levels


3.  IMPLICATIONS  OF  DATA  STRUCTURE  LEVELS  FOR  DATA  TRANSFER


Figure 2 is an illustration of a typical data base transfer. For convenience
and conciseness, we introduce the following notation.

•  S_i represents the i-th information system in an environment of
n systems (1 ≤ i ≤ n).

•  LDDL(i) denotes the logical data description language of S_i.

•  LDD(i,x) is a logical data description of a data base x in S_i
expressed in the language LDDL(i).

•  LD(i,x) denotes the logical data of data base x according to the
description LDD(i,x).

•  SDDL(i), SDD(i,x), SD(i,x), PDDL(i), PDD(i,x), and PD(i,x) are
defined analogously for the storage and physical data structure
levels, respectively.


We read data base x in S_i and extract PD(i,x). PDD(i,x), a specification of
how the PD(i,x) is organized, is also created. A transformation is now
performed on PD(i,x) to make SD(i,x). During this transformation, operating
system (physical-level) data characteristics are removed. The SDD(i,x)
created describes how the SD(i,x) is organized. The data transformer that
extracts LD(i,x) from SD(i,x) removes information-system-dependent (storage-
level) characteristics. If we restrict our attention to (or create) the rare
information systems that have common logical data characteristics, then
the transformation LD(i,x) → LD(j,x) is trivial. The more usual case, however,
will be handled by a data transformer that maps instances of LDD(i,x) to
equivalent instances of LDD(j,x). Once LDD(j,x) and LD(j,x) are obtained,
transformations will be performed which invoke necessary information system
(storage-level) and operating system (physical-level) data characteristics so
that the target physical data (PD(j,x)) will be an efficient representation for
the target system S_j. The final step is to write PD(j,x) as data base x on S_j.


Figure 2.  Data Base Transfer through Data Structure Levels

In summary, the data structure levels are used to modularize the data
transformation process. Three data description languages are defined: logical,
storage, and physical. Each is a formalism to describe the relationships between
its corresponding data structures and data. Data are transformed from physical
to storage, storage to logical, logical to storage, and storage to physical data
structure levels by data transformers.
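
The modular chain PD(i,x) → SD(i,x) → LD(i,x) → LD(j,x) → SD(j,x) → PD(j,x)
can be illustrated with a toy sketch. Everything concrete below is an
assumption made for illustration: the source system is imagined to hold
fixed-width records, the target to hold comma-separated lines, and all
function names are hypothetical.

```python
# Source system S_i (toy): fixed-width records, 5-char name + 2-char grade.
def pd_to_sd_i(pd):
    """Strip the physical record layout, leaving field tuples."""
    return [(line[:5].rstrip(), line[5:7]) for line in pd.splitlines()]


def sd_to_ld_i(sd):
    """Strip storage details, leaving named, typed data items."""
    return [{"name": n, "grade": int(g)} for n, g in sd]


# Logical-level transformer: rename data elements for the target system.
def t_ij(ld):
    return [{"surname": r["name"], "level": r["grade"]} for r in ld]


# Target system S_j (toy): rows kept in surname order, written as CSV lines.
def ld_to_sd_j(ld):
    """Add the target's storage ordering."""
    return sorted((r["surname"], r["level"]) for r in ld)


def sd_to_pd_j(sd):
    """Add the target's physical record layout."""
    return "".join(f"{s},{l}\n" for s, l in sd)


def transfer(pd_i):
    """Run the data through every level, one transformer per step."""
    data = pd_i
    for stage in (pd_to_sd_i, sd_to_ld_i, t_ij, ld_to_sd_j, sd_to_pd_j):
        data = stage(data)
    return data
```

For example, `transfer("LEE  07\nKIM  03\n")` yields `"KIM,3\nLEE,7\n"`. Only
`t_ij` knows about both systems; every other stage is local to one system,
which is the modularity the levels are meant to provide.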

4.  A  SHORT-RANGE  APPROACH  TO  DATA  TRANSFER

Transferring data through the various data structure levels requires a preliminary
data transformer design specification, a data description language specification,
and implementation of data transformers for each level. Information system users
are primarily interested in accessing data, not in the technical elegance of the
access procedures. There are numerous situations in which a user might want to
transfer an existing data base from one system to another only once: when his
information system becomes obsolete and is replaced with an updated version, when
he changes his hardware configuration, when he wishes to change information
systems for efficiency considerations, when he wishes to have his data accessible
on a different manufacturer's machine, etc. It would be unreasonable in these
situations to put a large effort into automating transformers or designing new
data structures for his application, since he will only do it once. Consequently,
we are addressing data base transfer in an evolutionary manner. We are basing
our transfer process on logical level techniques. We will transform a source
data base from the information system environment to the logical level by
"hand-tailored" mechanisms. We will then transfer this logical data from source
system to target system with logical data structure techniques. The target system
logical data will be written into the target system environment in an expedient
manner.

These  techniques  can  then  be  refined  and  embellished  in  a  longer-range  plan 
where  the  transformation  between  levels  is  effected  in  an  efficient  and  elegant 
manner,  taking  full  advantage  of  data  characteristics  at  the  three  data 
structure  levels. 


We  introduce  a  few  additional  notations  to  precisely  express  the  tasks  involved: 

•  FDL(i) represents the file description mechanism (as it exists) in
S_i.

•  LDDL(i) denotes the logical subset of FDL(i) which is relevant to
the data transfer process.

•  The set of all possible data base descriptions LDD(i,x) in system
S_i is denoted LDD(i).

•  A data description mapping, which specifies for every instance of
LDD(i) an equivalent instance of LDD(j), is called M_ij.


We envision the data transfer process to begin with the derivation of an
LDD(i,x) in the language LDDL(i). The data description mapping M_ij will then
specify an equivalent LDD(j,x) given this LDD(i,x). The logical data LD(i,x)
will be extracted from data base x using the query capabilities of S_i and the
LDD(i,x). The data transformer T_ij(x) will transform instances of LD(i,x) to
instances of LD(j,x), which will then be written in S_j using the S_j file
generate capabilities and the LDD(j,x). The specific tasks and our approaches
to them are outlined below.
4.1  ANALYSIS  OF  FDLs


In  order  to  gain  insight  into  logical  data  structures,  we  will  analyze  existing 
systems'  file  description  languages  (FDLs)  and  logical  data  structures  as 
implied by those languages. We will concentrate on systems that are
representative of diverse application areas in both their logical data
structures and their internal structure characteristics. The systems that we plan
to study include: DS/3, an SDC hierarchical data management system; IMS, a widely
used IBM system that allows network data structures in addition to hierarchies;
and the Datacomputer, the ARPANET data management facility.

4.2  DEFINITION  OF  LDDLs 


We  expect  that  many  details  expressed  in  the  FDLs  are  reflections  of  physical 
characteristics  of  the  information  systems  that  do  not  need  to  be  reflected  in 
the LDDLs. For each system S_i, therefore, we will isolate or define its LDDL(i)
from the file description mechanisms expressed in its FDL.
4.3  ANALYSIS  OF  LDDL  CHARACTERISTICS


We  will  analyze  the  different  LDDLs  in  order  to  isolate  logical  structure 
characteristics  such  as  hierarchical  or  relational  structures,  depth  of 
hierarchical levels, and size of relations. The remaining LDDL characteristics,
such as the type, size, and format of fields, are data dependent.
This separation of logical structures and data-dependent properties of LDDLs
is important because the logical structures are the basis on which to define
the mappings of LDDLs, and the data-dependent properties are necessary for
the creation of an efficient data base organization in the target system.
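
One way to picture this separation, offered only as a sketch: suppose a
logical data description is held as a nested dictionary (an assumption of this
example, as are all segment and field names). The structural part and the
data-dependent part can then be pulled apart mechanically.

```python
# Hypothetical LDD(i,x): a two-level hierarchy with field properties.
ldd = {
    "name": "department",
    "fields": {"dept_no": {"type": "numeric", "size": 4}},
    "children": [
        {
            "name": "employee",
            "fields": {"surname": {"type": "alphabetic", "size": 20}},
            "children": [],
        },
    ],
}


def logical_structure(node):
    """Keep only segment names and nesting (the structural part)."""
    return {
        "name": node["name"],
        "children": [logical_structure(c) for c in node["children"]],
    }


def data_dependent(node, out=None):
    """Collect type and size of every field, keyed by 'segment.field'."""
    out = {} if out is None else out
    for field, props in node["fields"].items():
        out[f"{node['name']}.{field}"] = props
    for child in node["children"]:
        data_dependent(child, out)
    return out


def depth(node):
    """Number of hierarchical levels in the description."""
    return 1 + max((depth(c) for c in node["children"]), default=0)
```

In this picture a description mapping would be defined against the output of
`logical_structure`, while the output of `data_dependent` would be consulted
when choosing an organization in the target system.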

4.4  CLASSIFICATION  OF  DATA  STRUCTURE  COMPONENTS 


Different  existing  systems'  logical  data  structures  need  to  be  classified  with 
regard  to  such  things  as  terminology  (e.g.,  "table"  and  "relation"  are  sometimes 
used  synonymously),  type  (e.g.,  numeric,  alphabetic),  and  intrinsic  constraints 
(e.g.,  levels  of  hierarchies).  This  information  is  vital  in  order  to  recognize 
the  commonality  and  differences  among  various  systems'  logical  data  structures. 


For example, the CODASYL Task Group [7] attempts to allow COBOL system users to
share data bases by "tailoring" a data base substructure for each user wishing
to access the data. The schema and subschema in this approach are forms of
logical data descriptions. Terms like "groups," "arrays," "plexes," "trees,"
etc., are used to describe data structures. DIAM [4,5] uses "A-strings,"
"E-strings," and "L-strings" to describe similar data structures.

We will correlate the terminologies of the various systems so as to minimize
the number of classifications of logical data structures as well as utilize
the underlying commonality in our specification processes. The isolation of
types of data structures will be used to determine the feasibility of mapping
logical structures between systems. Intrinsic constraints like "levels" of
hierarchy have serious implications for the complexity of the data transfer
process. For example, suppose that there exists a data description mapping M_ij
that maps LDD(i) → LDD(j). Further, suppose that systems S_j and S_i allow
hierarchical data structures of depth 5 and 10, respectively. Then in the case of
the need to transfer a data base with 10 levels on S_i into a combination of
hierarchies of 5 levels or less on S_j, we need to isolate the logical
relationship portion of the LDD(i) and, working within this framework, map the
hierarchical structures of S_i into equivalent ones on S_j.
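
The paper does not say how such a 10-level hierarchy would be cut. One
possible strategy, sketched here purely as an assumption, is to truncate the
tree at the target's depth limit and promote each cut-off subtree to a new
root that remembers its former parent. The node layout and the "parent_link"
field are hypothetical.

```python
def split_hierarchy(root, max_depth):
    """Cut a tree into trees of at most max_depth levels.

    Nodes are dicts {"name": str, "children": [...]}. A subtree that would
    exceed the limit becomes a new root whose "parent_link" names the node
    it was attached to.
    """
    result = []
    pending = [(root, None)]
    while pending:
        node, link = pending.pop(0)
        result.append(_copy(node, 1, max_depth, link, pending))
    return result


def _copy(node, level, max_depth, link, pending):
    new = {"name": node["name"], "children": []}
    if link is not None:
        new["parent_link"] = link
    for child in node["children"]:
        if level < max_depth:
            new["children"].append(
                _copy(child, level + 1, max_depth, None, pending))
        else:
            # Too deep for this tree: queue the child as a new root.
            pending.append((child, node["name"]))
    return new


def depth(node):
    return 1 + max((depth(c) for c in node["children"]), default=0)


# A 10-level chain L1 -> L2 -> ... -> L10 (hypothetical source hierarchy).
chain = None
for i in range(10, 0, -1):
    chain = {"name": f"L{i}", "children": [chain] if chain else []}

parts = split_hierarchy(chain, 5)
```

For the chain above this yields two 5-level hierarchies, the second rooted at
L6 with a link back to L5. Whether a link of this kind preserves the logical
relationships adequately is exactly the kind of question the analysis of
LDD(i) would have to settle.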

4.5  DATA  DESCRIPTION  MAPPING  SPECIFICATION 

Two  alternative  approaches  to  specifying  data  description  mapping  will  be 
considered: 

(a)  Let S_i and S_j represent the i-th and j-th systems (i ≠ j) in an
environment of n information systems. There are n(n-1)
mappings (M_ij) necessary in order to map every LDD(i) to
every LDD(j). Two mappings are necessary for every pair i,j:

     M_ij:  LDD(i) → LDD(j)

     M_ji:  LDD(j) → LDD(i)

In most instances, M_ij is not the same as M_ji, since data
structure types and constraints will differ between the systems.
However, we should be able to take advantage of any commonalities
found (from the studies in previous steps) in the generation of
the M_ij's.
(b)  An alternative (and probably more efficient) way to achieve mappings
between the data descriptions is by identifying the commonality
intrinsic to logical data structures and logical data descriptions
and defining a common logical data description (CLDD) and its
corresponding language (CLDDL). We will explore the possibility
of making the CLDDL some set-theoretic formulation of all the DDLs.

Let M_ic be a data description mapping from LDD(i) to a CLDD. Two
typical mappings are necessary for systems S_i and S_j:

     LDD(i) → CLDD → LDD(j)     (M_ic followed by M_cj)

     LDD(j) → CLDD → LDD(i)     (M_jc followed by M_ci)


In  this  approach,  only  2n  mappings  are  necessary.  However,  this 
method  requires  specification  of  a  CLDDL,  whereas  the  first  approach 
requires  no  such  preliminary  step.  The  more  commonality  we  find  in 
the  logical  data  structures,  the  more  efficient  the  second  approach 
is. 
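
The trade-off between the two approaches can be made concrete with a small
sketch. The counting functions follow directly from the text; the dictionaries
of element names and the `compose` helper are hypothetical and assume that
each mapping is a simple one-to-one renaming.

```python
def pairwise_count(n):
    """Approach (a): one mapping for every ordered pair of systems."""
    return n * (n - 1)


def common_count(n):
    """Approach (b): one mapping to and one from the CLDD per system."""
    return 2 * n


# Hypothetical element-name mappings into a common description (M_ic, M_jc).
to_common = {
    "i": {"name": "person_name"},
    "j": {"surname": "person_name"},
}
# The reverse mappings (M_ci, M_cj), obtained here by inverting each renaming.
from_common = {
    s: {common: local for local, common in m.items()}
    for s, m in to_common.items()
}


def compose(src, dst):
    """Derive a direct mapping by applying M_(src)c and then M_c(dst)."""
    return {
        local: from_common[dst][common]
        for local, common in to_common[src].items()
    }
```

With 5 systems the counts are 20 against 10; with 3 they are equal at 6, and
with only 2 systems the direct approach needs fewer mappings (2 against 4).
In this toy setting `compose("i", "j")` gives `{"name": "surname"}`.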


4.6  DATA  BASE  TRANSFER 

For the purpose of testing the ideas and techniques for data transfer, a
preliminary experiment is necessary. We plan to extract LD(i,x) from the data
base x in S_i with hand-tailored mechanisms based on S_i's query capability and
the data base description LDD(i,x). We will build the data base transformer
T_ij(x), which transforms LD(i,x) into LD(j,x) based on the LDD(i) → LDD(j)
specifications. T_ij(x) will then transform LD(i,x) into LD(j,x), which will be
written into S_j with S_j's file generation capabilities and the LDD(j,x) by
similar methods. Possibly more elegant techniques could be explored at a later
time, after the basic information transfer techniques are achieved.
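
A minimal sketch of such an experiment follows. It assumes that the
description mapping reduces to a table of element renamings plus optional
value conversions, and it uses plain Python lists as stand-ins for the source
system's query capability and the target system's file generation capability.
None of these names come from the paper.

```python
def build_transformer(element_map, converters=None):
    """Return a function turning one source instance into a target instance.

    element_map: source element name -> target element name.
    converters:  target element name -> function applied to the value.
    """
    converters = converters or {}

    def transform(record):
        out = {}
        for src, dst in element_map.items():
            convert = converters.get(dst, lambda v: v)
            out[dst] = convert(record[src])
        return out

    return transform


def transfer(query, transform, generate):
    """Read with the source's query facility, write with the target's."""
    for record in query():
        generate(transform(record))


# Stand-ins for S_i's query capability and S_j's file generation capability.
source_rows = [{"name": "Lee", "grade": "07"}]
target_rows = []

t_ij = build_transformer({"name": "surname", "grade": "level"}, {"level": int})
transfer(lambda: iter(source_rows), t_ij, target_rows.append)
```

After the call, `target_rows` holds `[{"surname": "Lee", "level": 7}]`. The
transformer never touches storage or physical details, which remain the
business of the two systems' own query and generate facilities.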


5.  EXTENSIONS


We envision three directions which could lead to more automatic and efficient
techniques for achieving data transfer.

1.  Automatic generation of data description maps. We expect that while
we define specific mappings, we will find tools that can be realized
by meta-compiler techniques. This process could lead to the automatic
generation of data description maps. In particular, we believe that
the common logical data description (CLDD) concept will be useful.

2.  Automatic extraction and generation of data bases. This includes the
development of a methodology for extracting data from and inserting
data into data bases using data descriptions. We expect to take
advantage of existing generate and retrieval mechanisms in current
information systems. Research in this direction could lead to a
study of levels of data structures within information systems, and
their use in the process of transforming physical data into logical
data.

3.  Automatic transformation of data bases using the data description maps.
This includes the development of more efficient methods of manipulating
data during the transfer process, such as (1) buffering and (2) control
of data flow from the source system to the target system.


6. 